ABSTRACT

We use model selection forecasting to assess the ability of the Planck satellite to make a 
positive detection of spectral index running. We simulate Planck data for a range of assumed 
cosmological parameter values, and carry out a three-way Bayesian model comparison of a 
Harrison-Zel'dovich model, a power-law model, and a model including running. We find that 
Planck will be able to strongly support running only if its true value satisfies |dn/d ln k| > 0.02.
1 INTRODUCTION

Results from the Wilkinson Microwave Anisotropy Probe (WMAP), especially the first-year data (Spergel et al. 2003) and to some extent the three-year data (Spergel et al. 2007), have placed a focus on possible running of the spectral index of density perturbations (see e.g. Lidsey & Tavakol 2003; Kawasaki, Yamaguchi & Yokoyama 2003; Chung, Shiu & Trodden 2003; Bastero-Gil, Freese & Mersini-Houghton 2003; Chen et al. 2004; Covi et al. 2004; Ashoorioon, Hovdebo & Mann 2005; Ballesteros, Casas & Espinosa 2006; Cline & Hoi 2006; Cortes & Liddle 2006; Easther & Peiris 2006). It is certainly premature to draw any strong conclusions from existing evidence, especially as it remains controversial whether current data even support power-law models over the Harrison-Zel'dovich (HZ) model, but it is timely to investigate the extent to which the upcoming Planck satellite may resolve the situation.

As we have stressed in several recent papers (e.g. Mukherjee, Parkinson & Liddle 2006a; Parkinson, Mukherjee & Liddle 2006; Liddle, Mukherjee & Parkinson 2006a), the appropriate statistical tool for assessing the need to introduce new parameters is model selection (Jeffreys 1961; MacKay 2003; Gregory 2005). Model selection assigns probabilities to sets of parameters, i.e. models, in addition to the usual probability distributions for parameter values within each model. For example, Bayesian model selection applied to data compilations including WMAP3 shows that the case for including even just the spectral index n_s as a variable fit parameter is inconclusive (Parkinson et al. 2006).

In a recent paper (Pahud et al. 2006), we used model selection forecasting tools to assess the ability of the Planck satellite to distinguish between the Harrison-Zel'dovich model with n_s = 1 and a model with varying spectral index, VARYn. The outcome naturally depends on the assumed true value of n_s, which we call the fiducial value, and we found that Planck can strongly favour the latter model only if the true value of n_s lies outside the range [0.986, 1.014]. In making that comparison, we assumed that the true spectrum could be described by a power law.



In this paper, we extend that analysis to include the possibility of spectral index running, given by α = dn_s/d ln k. This adds an extra model, VARYna, to the model set, so that we are carrying out a three-way model comparison within the two-dimensional space defined by the fiducial values of n_s and α. Ideally we would also have included tensor perturbations in this analysis in order to fully represent the usual inflationary predictions (e.g. Liddle & Lyth 2000), but the present analysis is at the limits of current computer power, having required many months of multiprocessor time.



2 MODEL SELECTION FORECASTS FOR MODELS WITH RUNNING

2.1 Model selection forecasting 

Our approach follows our earlier paper (Pahud et al. 2006) exactly, and so we provide only the briefest of summaries here and refer to that paper and references therein for details. Model selection forecasting was first introduced by Trotta (2007b), whose Predictive Posterior Odds Distribution (PPOD) forecasting determined the probability of different model selection outcomes of future experiments based on present knowledge. An alternative approach, which delineates regions of parameter space where different model selection verdicts are expected, was introduced in Mukherjee et al. (2006b); a combination of the methods was used in Pahud et al. (2006), and also subsequently in Trotta (2006), Liddle et al. (2006b) and Trotta (2007a). As used here, data are simulated for the different possible true values of the parameters of interest, known as fiducial values, and a model comparison analysis is carried out at each point.

Although not required, in typical examples a simple model will be nested within a more complex one, e.g. the HZ model is the special case of VARYn with n_s = 1. If the (assumed) true model is the nested one, the model comparison will favour that model, and one may ask how strongly. If instead the true model is the more complex one, one can ask how far from the simple model the true values would have to be in order that a given experiment can overcome statistical uncertainty and deliver a strong or decisive verdict in favour of the complex model. These two notions can be used to define model selection Figures-of-Merit, assessing the abilities of competing experiments (Mukherjee et al. 2006b).

In our work, we use the Bayesian evidence E as the model selection statistic. Like any model selection statistic, it creates a tension between goodness of fit to the data and the complexity of the model. It represents a full implementation of Bayesian inference, being the probability of the data given the model (i.e. the model likelihood), and it updates the prior model probability to the posterior model probability. Computations are carried out using the nested sampling algorithm (Skilling 2006), as implemented in our code CosmoNest¹ (Mukherjee et al. 2006a; Parkinson et al. 2006). Computing the evidence accurately is significantly more challenging than computing the posterior probability distribution, and so the calculations are computationally time-consuming.
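As a rough illustration of how nested sampling turns likelihood evaluations into the evidence E = ∫ L(θ) π(θ) dθ, the sketch below implements the basic algorithm of Skilling (2006) on a one-dimensional toy problem. Everything in it is an assumption made for illustration (a normalised Gaussian likelihood, a flat prior on [−5, 5], and the helper names), and it is not CosmoNest. In particular, because the toy likelihood is symmetric about zero, the constrained region L > L_min is just an interval and can be sampled exactly, whereas CosmoNest draws replacement points from an enlarged ellipsoid around the live points.

```python
import math

import numpy as np


def log_likelihood(x, sigma=1.0):
    """Normalised 1D Gaussian centred on zero: a toy stand-in for a real likelihood."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def nested_sampling_log_evidence(n_live=300, n_iter=3000, half_width=5.0,
                                 sigma=1.0, seed=0):
    """Estimate ln E for a flat prior on [-half_width, half_width]."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(-half_width, half_width, size=n_live)
    live_logl = np.array([log_likelihood(x, sigma) for x in live])

    log_z = -np.inf      # running estimate of ln E
    log_x_prev = 0.0     # ln of the enclosed prior mass, starting from X_0 = 1
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))
        log_x = -i / n_live  # expected shrinkage: ln X_i = -i / N
        # Weight w_i = X_{i-1} - X_i, computed in log space.
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = np.logaddexp(log_z, live_logl[worst] + log_w)

        # Replace the worst point by a prior draw with L > L_worst.
        # For this symmetric toy likelihood that region is |x| < |x_worst|.
        r = abs(live[worst])
        live[worst] = rng.uniform(-r, r)
        live_logl[worst] = log_likelihood(live[worst], sigma)
        log_x_prev = log_x

    # Each remaining live point carries prior mass X_final / N.
    log_z = np.logaddexp(
        log_z, log_x_prev + np.logaddexp.reduce(live_logl) - math.log(n_live))
    return float(log_z)


if __name__ == "__main__":
    exact = math.log(math.erf(5.0 / math.sqrt(2.0)) / 10.0)
    print(f"nested sampling ln E = {nested_sampling_log_evidence():.3f}")
    print(f"exact ln E           = {exact:.3f}")
```

The estimate should agree with the exact value ln[erf(5/√2)/10] ≈ −2.303 within its statistical scatter, which shrinks roughly as 1/√N for N live points; this is one reason repeated runs are averaged.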

Our assumption is that there are three models of interest in fitting future Planck data. These are the Harrison-Zel'dovich model, a power-law model where n_s is fit from data, and a model where both n_s and α are varied. We denote these models HZ, VARYn, and VARYna respectively, and also indicate them by use of subscripts 0, 1, and 2 respectively.

In the presence of running, the spectral index is defined in the 
usual way by 

$$ n_s(k) = n_s(k_0) + \alpha \ln\frac{k}{k_0} \qquad (1) $$

The pivot scale k_0 = 0.05 Mpc^-1 corresponds to a scale well constrained by existing data. When running is included, n_s is always specified at this scale, and throughout we assume the running is constant. As in Pahud et al. (2006), the prior range for n_s is taken to be 0.8 < n_s < 1.2, representing a reasonable range allowed by slow-roll inflation models (see e.g. Liddle & Lyth 2000).
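Equation (1) is easy to evaluate directly. The sketch below also includes a primordial spectrum with constant running, assuming the common parametrisation P(k) = A_s (k/k_0)^{n_s − 1 + (α/2) ln(k/k_0)}, whose logarithmic slope reproduces equation (1) through n_s(k) − 1 = d ln P / d ln k. That parametrisation and the illustrative fiducial values in the usage lines are our assumptions, not taken from the analysis.

```python
import math

K0 = 0.05  # pivot scale k_0 in Mpc^-1, as in the text


def spectral_index(k, n_s0, alpha, k0=K0):
    """Scale-dependent index n_s(k) = n_s(k0) + alpha * ln(k / k0), equation (1)."""
    return n_s0 + alpha * math.log(k / k0)


def primordial_power(k, a_s, n_s0, alpha, k0=K0):
    """Power spectrum with constant running (assumed parametrisation)."""
    x = math.log(k / k0)
    return a_s * math.exp((n_s0 - 1.0 + 0.5 * alpha * x) * x)


if __name__ == "__main__":
    # Illustrative point: n_s(k0) = 0.95 with running alpha = -0.02.
    for k in (0.0005, 0.005, 0.05, 0.2):
        print(f"k = {k:7.4f} Mpc^-1   n_s(k) = {spectral_index(k, 0.95, -0.02):.4f}")
```

With negative α the local index falls on small scales (large k) and rises on large scales, which is why the running and the value of n_s at k_0 can partially trade off against each other in a fit.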

We take the prior on α to be −0.1 < α < 0.1. This is somewhat arbitrary. Slow-roll inflation models would tend to suggest a much smaller value (Kosowsky & Turner 1995), but there is no point in restricting the analysis to values smaller than Planck can measure, as one would simply conclude that Planck is unable to make the measurement. Accordingly, our range is loosely motivated by present observational knowledge, corresponding to models with unexpectedly large running. The comparison between two models does have some prior dependence on the extra parameter(s). If one prior is widened in regions where the likelihood is negligible, then the evidence changes in inverse proportion to the prior volume, so for instance a doubling of the prior range will only reduce ln E by ln 2 ≈ 0.69.
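The ln 2 figure can be checked with a one-parameter toy model, assuming a Gaussian likelihood in α of width 0.007 (roughly the projected Planck uncertainty quoted in the conclusions) and a flat prior centred on zero. The function name and defaults are illustrative. The second comparison shows that the ln 2 rule fails when the prior is narrow enough to cut into the likelihood.

```python
import math


def log_evidence_flat_prior(width, mu=0.0, sigma=0.007, center=0.0):
    """ln E for a Gaussian likelihood (peak value 1) under a flat prior of given width."""
    lo, hi = center - width / 2.0, center + width / 2.0
    # Fraction of the Gaussian's area lying inside the prior range.
    inside = 0.5 * (math.erf((hi - mu) / (sigma * math.sqrt(2.0)))
                    - math.erf((lo - mu) / (sigma * math.sqrt(2.0))))
    # E = (1 / width) * integral of L over the prior range.
    return math.log(sigma * math.sqrt(2.0 * math.pi) * inside / width)


if __name__ == "__main__":
    # Prior -0.1 < alpha < 0.1 (width 0.2) versus a doubled range (width 0.4).
    wide = log_evidence_flat_prior(0.2) - log_evidence_flat_prior(0.4)
    # A prior narrower than the likelihood: widening it helps the fit as well.
    narrow = log_evidence_flat_prior(0.01) - log_evidence_flat_prior(0.02)
    print(f"drop in ln E, wide priors:   {wide:.4f}")
    print(f"drop in ln E, narrow priors: {narrow:.4f}")
```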

In running CosmoNest, the algorithm parameters used were N = 300 live points and an enlargement factor of 1.8 for HZ, 1.9 for VARYn, and 2.0 for VARYna. The tolerance parameter was set to 20 (rather than 0.5 as in our previous analysis) in order to improve the speed of the simulations. This is sufficient to give answers to good accuracy, as indicated by the uncertainties obtained. Four independent evidence evaluations were done for each calculation, to obtain the mean and its standard error.
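The averaging step can be sketched as follows. The four ln E values per model are hypothetical, and adding the two standard errors in quadrature to get an error on ln B assumes the evidence runs for the two models are independent; both function names are illustrative.

```python
import math
import statistics


def mean_and_standard_error(log_evidences):
    """Mean of repeated ln E estimates and the standard error of that mean."""
    n = len(log_evidences)
    mean = statistics.fmean(log_evidences)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return mean, statistics.stdev(log_evidences) / math.sqrt(n)


def log_bayes_factor(runs_i, runs_j):
    """ln B_ij = ln E(M_i) - ln E(M_j); errors added in quadrature (independent runs)."""
    mean_i, se_i = mean_and_standard_error(runs_i)
    mean_j, se_j = mean_and_standard_error(runs_j)
    return mean_i - mean_j, math.hypot(se_i, se_j)


if __name__ == "__main__":
    # Hypothetical ln E values from four independent runs for two models.
    runs_hz = [-10.2, -10.4, -10.1, -10.3]
    runs_varyn = [-13.9, -13.8, -14.0, -13.9]
    ln_b, err = log_bayes_factor(runs_hz, runs_varyn)
    print(f"ln B = {ln_b:.2f} +/- {err:.2f}")
```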

We then compare our models in pairs by considering the Bayes factor, defined as the ratio of evidences between two models, written B_ij = E(M_i)/E(M_j), for i, j = 0, 1, 2 (i ≠ j), where M_i and M_j indicate the two models under comparison. By plotting the Bayes factor using datasets generated as a function of the two parameters of interest, one uncovers the regions of the two-dimensional fiducial parameter space in which the Planck satellite would be able to decisively select between the two models, and also those regions where the comparison would be inconclusive.

¹ Available at http://cosmonest.org

Figure 1. The logarithm of the Bayes factor, ln B_01, as a function of the fiducial value of n_s. The horizontal lines indicate where the comparison becomes 'strong' (dashed) and 'decisive' (solid) on the Jeffreys' scale.

In order to assess the significance of any difference in evidence between two models, a useful guide is given by the Jeffreys' scale (Jeffreys 1961). Labelling as M_i the model with the higher evidence, it rates ln B_ij < 1 as 'not worth more than a bare mention', 1 < ln B_ij < 2.5 as 'substantial', 2.5 < ln B_ij < 5 as 'strong' to 'very strong', and 5 < ln B_ij as 'decisive'.
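The Jeffreys' scale as stated above can be encoded as a small lookup. How values falling exactly on a boundary (1, 2.5 or 5) are labelled is our own choice, since the scale is only given with strict inequalities; the function name is illustrative.

```python
def jeffreys_label(ln_b):
    """Jeffreys' scale verdict for ln B_ij; the sign only says which model is favoured."""
    strength = abs(ln_b)  # a negative ln B_ij means M_j is the favoured model
    if strength < 1.0:
        return "not worth more than a bare mention"
    if strength < 2.5:
        return "substantial"
    if strength < 5.0:
        return "strong to very strong"
    return "decisive"


if __name__ == "__main__":
    # ln B values quoted in the results section for the point n_s = 1, alpha = 0.
    for name, ln_b in (("B_01", 3.6), ("B_02", 6.3), ("B_12", 2.7)):
        print(name, ln_b, jeffreys_label(ln_b))
```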

2.2 Simulating Planck data 

We simulate Planck data exactly as described in Pahud et al. (2006). Having determined the fiducial model power spectra, we simulate temperature power spectrum data for the three most sensitive High Frequency Instrument (HFI) channels and the polarization signal for only one of these channels, modelling instrument noise using current detector specifications. The simulations are somewhat simplistic, as computational limitations prevent a more detailed treatment that might include residuals from foreground subtraction and 1/f noise. However, they should provide a good characterization of the Planck data for our purposes. Simulations are carried out for various values of the spectral index and its running, and the other parameters are those of the usual ΛCDM model in a flat spatial geometry.
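One standard way to model white instrument noise in forecasts of this kind is a noise power spectrum deconvolved by a Gaussian beam, N_ℓ = Δ² exp[ℓ(ℓ+1) θ_FWHM² / (8 ln 2)], with channels combined by inverse-variance weighting. We do not claim this is exactly the prescription used here, and the beam widths and noise levels below are placeholders, not Planck HFI specifications; the function names are illustrative.

```python
import math

import numpy as np

ARCMIN = math.pi / (180.0 * 60.0)  # one arcminute in radians


def channel_noise(ells, fwhm_arcmin, noise_uk_arcmin):
    """White-noise power N_ell (muK^2) for one channel, deconvolved by a Gaussian beam."""
    theta = fwhm_arcmin * ARCMIN
    w_inv = (noise_uk_arcmin * ARCMIN) ** 2
    return w_inv * np.exp(ells * (ells + 1) * theta ** 2 / (8.0 * math.log(2.0)))


def combined_noise(ells, channels):
    """Inverse-variance combination of channels: 1/N_ell = sum_c 1/N_ell,c."""
    inv = sum(1.0 / channel_noise(ells, fwhm, noise) for fwhm, noise in channels)
    return 1.0 / inv


if __name__ == "__main__":
    ells = np.arange(2, 2501)
    # Placeholder (beam FWHM in arcmin, noise in muK*arcmin) for three channels.
    channels = [(9.5, 60.0), (7.1, 40.0), (5.0, 60.0)]
    n_comb = combined_noise(ells, channels)
    print(f"combined N_ell at ell = 1000: {n_comb[998]:.3e} muK^2")
```

The exponential beam factor is what makes each channel's noise blow up beyond ℓ of order 1/θ_FWHM, so the narrowest-beam channel dominates the combination at high ℓ.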

In simulating the data, we are primarily interested in the dependence on the key parameters of interest, n_s and α, and different data simulations are carried out for a grid of values in that plane. The other cosmological parameters are given fixed fiducial values as in Pahud et al. (2006), namely the baryon physical density Ω_b h^2 = 0.024, the cold dark matter physical density Ω_c h^2 = 0.103, the angular scale of the sound horizon θ = 1.047, the optical depth τ = 0.14, and the density perturbation amplitude normalization A_s = 2.3 × 10^-9. The corresponding value of the Hubble parameter is h = 0.78. The model selection verdict should have negligible dependence on these fiducial values. Note that all parameters, including these, are varied in computing the evidences of the models; it is only in defining the fiducial models for data simulation that these parameters are fixed. The prior ranges used for these parameters are as in Pahud et al. (2006): 0.018 < Ω_b h^2 < 0.032, 0.04 < Ω_c h^2 < 0.16, 0.98 < θ < 1.1, 0 < τ < 0.5, and 2.6 < ln(A_s × 10^10) < 4.2.

Figure 2. The logarithm of the Bayes factors, ln B_01 in the left panel, ln B_02 in the centre, and ln B_12 in the right, as a function of the fiducial values of n_s and α. The contour lines represent different steps in the Jeffreys' scale. From the plot centres, the levels are 2.5, 0, −2.5, −5 in the left and right panels, with the centre panel contours starting at 5.

3 RESULTS



We begin by showing in Fig. 1 the main result obtained in Pahud et al. (2006). In that analysis, running was not included and so the fiducial α is zero. At n_s = 1, corresponding to HZ being the true model, the HZ model is strongly preferred with ln B_01 = 3.6 ± 0.1. It has a higher evidence since it can fit the data just as well as VARYn and has one fewer parameter. Once n_s is far enough away from 1, the HZ fit becomes very poor and the Bayes factor plummets. The speed with which this happens indicates the strength of the experiment. The VARYn model becomes strongly favoured only once n_s < 0.986 or n_s > 1.014; if the true value lies within that range, even the Planck satellite will give inconclusive results.

Figure 2 shows the extension of our results into the α-n_s plane, now showing the three-way model comparison. The left plot still shows the comparison between HZ and VARYn, though neither is the true model except at α = 0 (Fig. 1 is the cross-section of this plot at α = 0). The plot is not surprising in the sense that the logarithm of the Bayes factor is roughly independent of α: the models HZ and VARYn are equally bad at describing a non-zero running. However, a slight tilt of the contours appears when α moves away from zero. This indicates that a positive (resp. negative) running can be balanced by a scalar index smaller (resp. bigger) than 1, according to equation (1). This can benefit HZ or VARYn, depending on whether it helps or hinders the HZ model in fitting the data. In fact the effect just reflects that the scale k_0 is not quite at the statistical centre of the data, so that the determinations of n_s and α are somewhat correlated; this correlation could be removed by a judicious choice of the 'pivot' scale (Cortes, Liddle & Mukherjee 2007).
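Why a different pivot removes the correlation can be seen from equation (1): since n_s(k_*) = n_s(k_0) + α ln(k_*/k_0), its covariance with α vanishes when ln(k_*/k_0) = −Cov(n_s, α)/Var(α). A minimal sketch, assuming a Gaussian posterior with a hypothetical covariance matrix for (n_s(k_0), α); the numbers and function names are illustrative and are not results of this analysis.

```python
import math

import numpy as np


def decorrelation_pivot(cov, k0=0.05):
    """Pivot k_* at which n_s(k_*) and alpha are uncorrelated.

    cov is the 2x2 covariance of (n_s(k0), alpha) at the original pivot k0.
    """
    return k0 * math.exp(-cov[0, 1] / cov[1, 1])


def transform_covariance(cov, k0, k_new):
    """Covariance of (n_s(k_new), alpha), using the linear map of equation (1)."""
    x = math.log(k_new / k0)
    jac = np.array([[1.0, x], [0.0, 1.0]])
    return jac @ cov @ jac.T


if __name__ == "__main__":
    # Hypothetical errors: sigma(n_s) = 0.005, sigma(alpha) = 0.007, correlation 0.6.
    s_n, s_a, rho = 0.005, 0.007, 0.6
    cov = np.array([[s_n ** 2, rho * s_n * s_a], [rho * s_n * s_a, s_a ** 2]])
    k_star = decorrelation_pivot(cov)
    print(f"k_* = {k_star:.4f} Mpc^-1")
    print(transform_covariance(cov, 0.05, k_star))
```

Because the map only shifts n_s by a multiple of α, the error on α itself is unchanged; only the covariance and the error on n_s depend on where the pivot sits.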

The centre panel now introduces a comparison of HZ with VARYna, which is the true model in most of the parameter plane. At [α, n_s] = [0, 1], the HZ model is decisively preferred with ln B_02 = 6.3 ± 0.1. Its higher evidence arises since it can fit the data just as well as VARYna but has two fewer parameters. Once the fiducial point in the two-dimensional space is far enough away from the centre, the HZ fit becomes very poor and the VARYna model becomes favoured.

Being the true model, VARYna can adapt its two extra free parameters to fit the data equally well at every point of the fiducial space, thus leading to the same evidence throughout. We have verified that this holds to excellent accuracy in our simulations. The behaviour of the Bayes factor should therefore be approximately symmetric with respect to n_s = 1 and to α = 0. However, this is clearly not quite the case, for the same reason that the contours in the left panel are tilted. The influence of the correlation between the two fiducial parameters is greater this time, as it acts on HZ only.

Figure 3. The logarithm of the Bayes factor, ln B_12, as a function of the fiducial value of α. The horizontal lines indicate where the comparison becomes 'strong' (dashed) and 'decisive' (solid) on the Jeffreys' scale.

Finally, we need to consider a comparison between the models VARYn and VARYna, which is illustrated in the right panel of Fig. 2. This plot is fully determined by the above results, since by definition ln B_12 = ln B_02 − ln B_01. Moreover, for the same reason that the evidence of VARYna is independent of both fiducial parameters n_s and α, the evidence of VARYn turns out to be independent of n_s. This allows us to restrict our analysis to one dimension only, shown in Fig. 3. At α = 0 the VARYn model is strongly preferred over VARYna, with ln B_12 = 2.7 ± 0.1, since it has one fewer parameter. The running model becomes strongly favoured only if the true running satisfies |α| > 0.02.
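The identity ln B_12 = ln B_02 − ln B_01 follows from B_12 = E(M_1)/E(M_2) = [E(M_0)/E(M_2)] / [E(M_0)/E(M_1)], and can be checked against the values quoted at α = 0, n_s = 1. The error propagation line is our own illustration: naive quadrature treats the two errors as independent, but both Bayes factors share the HZ evidence, whose scatter cancels in the difference, so this likely overestimates the uncertainty compared with a direct computation.

```python
import math

# Values quoted in the text at the fiducial point alpha = 0, n_s = 1.
ln_b01, err_b01 = 3.6, 0.1   # HZ versus VARYn
ln_b02, err_b02 = 6.3, 0.1   # HZ versus VARYna

# B_12 = B_02 / B_01, so the logarithms subtract.
ln_b12 = ln_b02 - ln_b01
# Naive quadrature sum (assumes independent errors; see the note above).
err_b12 = math.hypot(err_b01, err_b02)
print(f"ln B_12 = {ln_b12:.1f} +/- {err_b12:.2f}")
```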

Figure 4. Two graphical representations of the three-way model comparison. The upper panel is a false-colour RGB plot with the probabilities of HZ, VARYn, and VARYna assigned to the red, green, and blue channels respectively. The lower panel simply shows the model which would receive the highest model probability at each point in the fiducial parameter space, with those three models allocated white, grey, and black respectively.

In Fig. 4 we display the full three-way model comparison in two different ways. The three-model case is perfectly adapted to display as a false-colour RGB plot, where the intensity of each of the three red-green-blue colour channels at a given fiducial point is set to the posterior probability of the corresponding model, given by Bayes' theorem,



$$ P_i \equiv P(M_i \mid D) = \frac{P(D \mid M_i)\,P(M_i)}{\sum_j P(D \mid M_j)\,P(M_j)} \qquad (2) $$



Here we assume that the prior model probabilities P(M_i) are equal (an assumption readily varied if required), so the equation simplifies to



$$ P_i = \frac{E(M_i)}{\sum_j E(M_j)} \qquad (3) $$



That the total probability sums to one corresponds to fixed total intensity. This is shown in the upper panel. The region which appears red would lead to the HZ model being preferred, green the VARYn model, and blue the VARYna model. Between those, regions which interpolate into secondary colours share their probability between the different models. There are also four 'vertices' at which all three models have the same probability. We see that the transitions between the different domains are rather rapid in terms of the shifting model probabilities.

The lower plot shows a much simpler representation, where each region is shaded according to the model with the highest probability there.
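Equation (3) and both panels of Fig. 4 can be sketched in a few lines, assuming equal prior model probabilities as in the text. Since only ratios of evidences matter, the log-evidences below are set relative to HZ using ln B_01 = 3.6 and ln B_02 = 6.3 at α = 0, n_s = 1; the function names are illustrative.

```python
import numpy as np

MODELS = ("HZ", "VARYn", "VARYna")


def model_probabilities(log_evidences):
    """Equation (3): P_i = E_i / sum_j E_j, evaluated stably in log space."""
    log_e = np.asarray(log_evidences, dtype=float)
    return np.exp(log_e - np.logaddexp.reduce(log_e))


def rgb_and_winner(log_evidences):
    """(R, G, B) = (P_HZ, P_VARYn, P_VARYna) and the most probable model."""
    probs = model_probabilities(log_evidences)
    return tuple(probs), MODELS[int(np.argmax(probs))]


if __name__ == "__main__":
    # ln E relative to HZ at alpha = 0, n_s = 1: (0, -ln B_01, -ln B_02).
    rgb, winner = rgb_and_winner([0.0, -3.6, -6.3])
    print([round(float(c), 3) for c in rgb], winner)
```

Working with log-evidences and a log-sum-exp normalisation avoids overflow, since realistic ln E values are large in magnitude even when their differences are small.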

These two plots affirm the results already apparent from the earlier figures: for Planck to be able to demonstrate that n_s ≠ 1, the true value will have to be more than 0.01 away from unity (Pahud et al. 2006), and for running to be convincingly detected |α| will need to be at least 0.02.



4 CONCLUSIONS 



According to WMAP3 analyses (Spergel et al. 2007), the running is presently constrained, at 95% confidence, to lie approximately in the range −0.17 < α < +0.01. The precise constraints depend both on the dataset combination used and on the model assumptions made (e.g. whether or not to include tensor perturbations), and we have simply quoted the broadest available. Although the range is highly skewed to negative values, the special status of α = 0, together with the slow-roll inflation prediction of an α value that current experiments cannot distinguish from zero, means that from a model selection point of view α = 0 should still be regarded as a very plausible interpretation of the data.

Given this inconclusive position, we have addressed the extent to which the Planck satellite is likely to resolve the situation, using model selection tools to compare three models: Harrison-Zel'dovich (HZ), power-law initial perturbations (VARYn), and the running model (VARYna). The expected outcome depends, of course, on which (if any) of these models proves to be the correct one.

Supposing first that HZ is the true model, we found in Pahud et al. (2006) that VARYn would be strongly, though not decisively, disfavoured after Planck. The present paper adds the new information that the running model would be decisively disfavoured in this circumstance.

Suppose instead that VARYn is true. Then VARYn will be strongly, but not decisively, preferred over VARYna. However, as shown in Pahud et al. (2006), the true value of n_s has to be sufficiently far from one in order for VARYn to be favoured over HZ. Depending on the true parameter values, all three models may survive application of Planck data.

Finally, suppose VARYna is true. The alternatives will only be decisively ruled out provided the true value satisfies |α| > 0.02; otherwise the outcome will again be indecisive. The conclusion is that Planck will improve knowledge as compared to WMAP3 by a factor of around four (our calculations indicate a projected parameter uncertainty from Planck of about ±0.007 on α, to be compared with the current ±0.03 from WMAP3 alone), and thus does have the capability to convincingly detect running if it is prominent. However, it does not have the accuracy to probe into the region where slow-roll inflation models typically lie (Kosowsky & Turner 1995).
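The "factor of around four" follows directly from the two quoted uncertainties. Expressing the |α| = 0.02 threshold in units of the projected Planck error is only a rough heuristic comparison that we add for orientation; it is not part of the Bayesian analysis.

```python
sigma_wmap3 = 0.03    # current uncertainty on alpha from WMAP3 alone, as quoted
sigma_planck = 0.007  # projected Planck-alone uncertainty on alpha, as quoted
threshold = 0.02      # |alpha| needed for a strong model selection verdict

improvement = sigma_wmap3 / sigma_planck
threshold_in_sigma = threshold / sigma_planck
print(f"improvement factor: {improvement:.1f}")
print(f"threshold: {threshold_in_sigma:.1f} projected Planck standard deviations")
```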

Our analysis refers to Planck satellite data alone, and, as with WMAP3, one would expect some further tightening with incorporation of other datasets probing different length scales.

As with any Bayesian analysis, the results have some dependence on prior assumptions. For the priors we have chosen on n_s and α, the data are able to constrain the likelihood well within them. Consequently, any change in prior ranges that continues to respect this will just change the evidences according to the change in volume, an effect one can readily calculate. Bearing in mind that the Jeffreys' scale is logarithmic, a sizeable change in prior parameter ranges would be needed to significantly alter the conclusions.